parser_json: use JSON as fallback parser instead of Yajl for performance#4813
parser_json: use JSON as fallback parser instead of Yajl for performance#4813daipom merged 1 commit intofluent:masterfrom
Conversation
|
we are happy if we could observe micro benchmark result. |
Micro benchmarkbenchmark coderequire 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'benchmark-ips'
gem 'oj'
gem 'json'
gem 'yajl-ruby', require: 'yajl'
end
Benchmark.ips do |x|
json = '{"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]}'
x.report('Oj.load') { Oj.load(json) }
x.report('JSON.load') { JSON.load(json) }
x.report('Yajl.load') { Yajl.load(json) }
x.compare!
endresult |
2078284 to
9b3887f
Compare
|
By the way, which json gem version did you tested with? |
I tested with json v2.9.1 (latest version), it is default version in Ruby 3.4.1 If we will use Ruby 3.4 or later in the next fluent-package LTS, hmmm, I think we don't need to specify the version... |
9968af9 to
bfc027e
Compare
|
NOTE: will not be backported to v1.16 |
Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
bfc027e to
aa67069
Compare
daipom
left a comment
There was a problem hiding this comment.
LGTM. Thanks!!
Waiting CI.
|
I created separated PR in #4817 |
**Which issue(s) this PR fixes**: Fixes # **What this PR does / why we need it**: Ref. #4813 (comment) Fix wrong usage of `JSON.load` method. We should use `JSON.parse` instead. JSON.load performs unnecessary deserialisation. ``` irb(main):001> JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello" irb(main):002> JSON.parse('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => {"json_class" => "String", "raw" => [72, 101, 108, 108, 111]} ``` **Docs Changes**: **Release Note**: Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
…rmance (#4835) **Which issue(s) this PR fixes**: Fixes # **What this PR does / why we need it**: Recently, Ruby's json has incredible performance improvements. It might be faster than oj gem. So, I think json is a suitable as fallback. This is similar with #4813 * Before * It spent 70.708872503 sec to handle 10 GB file * After * It spent 54.428017716 sec to handle 10 GB file * config ``` <source> @type tail path "#{File.expand_path('~/tmp/fluentd/access*.log')}" pos_file "#{File.expand_path('~/tmp/fluentd/access.log.pos')}" tag log read_from_head true <parse> @type none </parse> </source> <match **> @type file path "#{File.expand_path('~/tmp/fluentd/output/log')}" # use formatter_json format json </match> ``` **Docs Changes**: **Release Note**: --------- Signed-off-by: Shizuo Fujita <fujita@clear-code.com>
|
Following this, can yajl-ruby be a development_dependency, so that it is not installed by default? |
|
The json gem cannot handle stream-based JSON using IO objects, so we still need |
Could you clarify what you mean by streaming here? Is it SAX style parsing ? Or JSONL ? Other ? I tried searching for YAJL use in that repo, but saw nothing like that. |
|
@byroot See in_forward.rb for example. The input is read from a socket in chunks. |
Ah, sorry. It is just IO stream. |
This seem to be the code in question: fluentd/lib/fluent/plugin/in_forward.rb Lines 250 to 257 in d63d193 But just looking at the code, I'm not 100% sure I get what the format it. Seems to be JSONL ? e.g. {"foo": 1}
{"foo": 2}Is that correct? |
|
For example, we parse a JSON from an IO stream in the following. fluentd/lib/fluent/plugin/parser_json.rb Lines 96 to 102 in d63d193 |
|
@Watson1978 yes, I saw that, but just the code doesn't really help me. What I'd like to see is the content of that |
|
@byroot We have simple test case for that in fluentd/test/plugin/test_parser_json.rb Lines 146 to 168 in d63d193 require 'yajl'
y = Yajl::Parser.new
y.on_parse_complete = ->(record){
puts "Finished parsing: #{record}"
}
rio, wio = IO.pipe
wio.write '{"message": "hello"}'
wio.close
y.parse(rio)
rio.close |
|
Hum, right, that that's a single JSON document. What I'm trying to understand is whether it's expected that the pipe may contain multiple documents one after the other. That's what I think I understand, but I'll try to play a bit more with YAJL, to see what it offer. |
Which issue(s) this PR fixes:
Fixes #
What this PR does / why we need it:
Recently, Ruby's json has incredible performance improvements.
It might be faster than oj gem.
So, I think json is a suitable as fallback.
This is similar with #4759
Here is easily benchmark. (I used same log file in #4759)
Before
After
config
FYI)
Docs Changes:
fluent/fluentd-docs-gitbook#560
Release Note: